Abstract

In this paper we present some of the basic ideas associated with the
detection of abrupt changes in dynamic systems. Our presentation focuses on
two classes of methods, multiple filter-based techniques and residual-based
methods, and in far more detail on the multiple model and generalized
likelihood ratio methods. Issues such as the effect of unknown onset time on
algorithm complexity and structure and robustness to model uncertainty are
discussed.
1. Introduction

In recent years many techniques have been proposed for the detection of
abrupt changes in dynamic systems. These efforts have been motivated by a
wide variety of applications including the detection of sensor and actuator
failures [1, 2, 4, 19, 26-35], the tracking of maneuvering vehicles [20, 21,
23, 25], and numerous signal analysis problems (electrocardiogram analysis
[5, 6], geophysical signal processing [7], edge detection in images [8, 9],
freeway monitoring [10, 11], ...). A key to the development of any technique
for the detection of abrupt changes is the modeling of how the abrupt change
affects the observed signals. In some applications the effect of the abrupt
change is direct and simple, e.g. a bias developing in an output signal.
In such problems the primary focus of research is on the precise nature of
the decision rule (see, for example, [8, 9, 26]). In other applications (such
as those described in [1, 2, 4, 10, 11, 19, 21]), the effect on the
observables is described in a more complex, indirect way, for example in terms
of an abrupt change in the dynamics of a system. In such problems one is
presented in essence with two problems: the processing of the observed
signals in order to accentuate (and simplify) the effect of the abrupt change
and the definition of decision statistics and rules in terms of the processed
outputs. The techniques described in this paper in principle address both
of these issues in that they produce sufficient statistics for
optimum detection. However, we will focus for the most part on the first
task of change detection, that is, the problem of producing signals which make
subsequent detection as easy as possible. As discussed here and in more
detail in [27-29], this is an exceedingly important perspective in the design
of detection methods which are robust to uncertain details of the dynamic
models on which they are based.

In [1] a variety of methods and structures are described for change
detection. In this paper we focus on two basic and extremely important
structures. The first of these is the multiple filter structure depicted
in Figure 1. Here the observations, $y$, are processed by a bank of filters
each of which is based on a particular hypothesis (e.g. Filter #1 assumes no
change has occurred, Filter #2 assumes a particular type of change has
occurred possibly at a particular time, etc.). The outputs of the filters,
$\gamma_i$, represent signals which should typically be small if the
corresponding hypotheses are in fact correct, and thus the decision mechanism
in essence is based on determining which of the filters is doing the "best"
job of keeping the corresponding $\gamma_i$'s small. There are several methods
that have been developed which fit the general form of Figure 1. In
particular, hard [33] and soft [34] voting systems can be interpreted in this
fashion. Another example is the multiple observer design described in [36].
In the next section we describe in detail a third technique of this general
type, namely the multiple model method.

A second general structure for the detection of abrupt changes is the
residual-based structure illustrated in Figure 2. In this case a filter is
designed based on the assumption that no abrupt change has occurred or will
occur. The filter produces a prediction $\hat{y}$ of the output signal $y$
based on this assumption and the past history of the output, and this
prediction is subtracted from the actual output to produce a residual signal
$\gamma$. If no abrupt change has occurred, $\gamma$ should be small.
Consequently deviations from this behavior are indicative of failure, and it
is on this fact that the decision mechanism is based. Again there are a
variety of techniques of this general form. In [35] a variety of statistical
tests (chi-squared, whiteness, etc.) are proposed for the detection of abrupt
changes when the $\gamma$ are the innovations from a Kalman filter. In [30-32]
a method is described for the choice of gain in an observer-like filter in
order to guarantee the decoupling of the steady-state effects on $\gamma$ of a
given set of possible abrupt changes. In Section 3 we discuss a third
technique of this general type, namely the generalized likelihood ratio method.

2. The Multiple Model (MM) Method 

The MM method was originally developed for problems of system
identification and adaptive control [12-17, 24], and in the initial part of
this section we follow these early treatments. Subsequently we will look more
closely at the issues that arise and possible adaptations that may be
necessary for the use of MM for the detection of abrupt events (see [1, 2, 5,
10, 18, 19, 22, 23] for further developments).

The basic MM method deals with the following problem. We observe the
inputs $u(k)$, $k = 0,1,2,\ldots$ and outputs $y(k)$, $k = 1,2,\ldots$ of a
system which is assumed to obey one of a given finite set of linear stochastic
models, indexed by $i = 1,\ldots,N$:

$$x_i(k+1) = A_i(k)x_i(k) + B_i(k)u(k) + w_i(k) + g_i(k) \tag{2.1}$$

$$y(k) = C_i(k)x_i(k) + v_i(k) + b_i(k) \tag{2.2}$$

where $w_i(k)$ and $v_i(k)$ are independent, zero-mean Gaussian white noise
processes, with

$$E[w_i(k)w_i'(j)] = Q_i(k)\,\delta_{kj} \tag{2.3}$$

$$E[v_i(k)v_i'(j)] = R_i(k)\,\delta_{kj} \tag{2.4}$$


The initial state $x_i(0)$ is assumed to be Gaussian, independent of $w_i$
and $v_i$, with mean $\hat{x}_i(0|0)$ and covariance $P_i(0|0)$ (the meaning
of this notation will become clear in a moment). The matrices $A_i(k)$,
$B_i(k)$, $C_i(k)$, $Q_i(k)$, and $R_i(k)$ are assumed to be known. Also,
$b_i(k)$ and $g_i(k)$ are given deterministic functions of time (corresponding
to biases, linearizations about different operating points, etc.). In
addition, the state vectors $x_i(k)$ may be of different dimensions for
different values of $i$ (corresponding to assuming that the different
hypothesized models represent different orders for the dynamics of the real
system). There are a number of issues that can be raised concerning this
formulation, and we defer our critique of the MM method until after we have
developed its basic structure. We note here only one technical point, which is
that we will focus on a discrete-time formulation of the MM method.
Continuous-time versions can be found in the literature (see [24]), and they
differ from their discrete-time counterparts only in a technical and not in a
conceptual or structural manner.


Assuming that one of these $N$ models is correct, we now have a standard
multiple hypothesis testing problem. That is, let $H_i$ denote the hypothesis
that the real system corresponds to the $i$th model, and let $p_i(0)$ denote
the a priori probability that $H_i$ is true. Similarly, let $p_i(k)$ denote
the probability that $H_i$ is true based on measurements through the $k$th
measurement, i.e. given $I_k = \{u(0),\ldots,u(k-1),y(1),\ldots,y(k)\}$. Then
Bayes' rule yields the following recursive formula for the $p_i(k)$:

$$p_i(k+1) = \frac{p(y(k+1)\,|\,H_i,I_k,u(k))\,p_i(k)}{\sum_{j=1}^{N} p(y(k+1)\,|\,H_j,I_k,u(k))\,p_j(k)} \tag{2.5}$$

Thus, the quantities that must be produced at each time are the conditional
probability densities $p(y(k+1)\,|\,H_i,I_k,u(k))$ for $i=1,\ldots,N$.
However, conditioned on $H_i$ this probability density is precisely the
one-step prediction density produced by a Kalman filter based on the $i$th
model.

That is, let $\hat{x}_i(k+1|k)$ be the one-step predicted estimate of
$x_i(k+1)$ based on $I_k$ and $u(k)$, assuming that $H_i$ is true. Also let
$\hat{x}_i(k+1|k+1)$ denote the filtered estimate of $x_i(k+1)$ based on
$I_{k+1} = \{I_k,u(k),y(k+1)\}$ and the $i$th model. Then these quantities are
computed sequentially from the following equations:

$$\hat{x}_i(k+1|k) = A_i(k)\hat{x}_i(k|k) + B_i(k)u(k) + g_i(k) \tag{2.6}$$

$$\hat{x}_i(k+1|k+1) = \hat{x}_i(k+1|k) + K_i(k+1)\gamma_i(k+1) \tag{2.7}$$

where $\gamma_i(k+1)$ is the measurement innovations process

$$\gamma_i(k+1) = y(k+1) - C_i(k)\hat{x}_i(k+1|k) \tag{2.8}$$

and $K_i(k+1)$ is calculated off-line from the following set of equations:

$$P_i(k+1|k) = A_i(k)P_i(k|k)A_i'(k) + Q_i(k) \tag{2.9}$$

$$V_i(k+1) = C_i(k)P_i(k+1|k)C_i'(k) + R_i(k) \tag{2.10}$$

$$K_i(k+1) = P_i(k+1|k)C_i'(k)V_i^{-1}(k+1) \tag{2.11}$$

$$P_i(k+1|k+1) = P_i(k+1|k) - K_i(k+1)C_i(k)P_i(k+1|k) \tag{2.12}$$


Here $P_i(k+1|k)$ denotes the estimation error covariance in the estimate
$\hat{x}_i(k+1|k)$ (assuming $H_i$ to be true), and $P_i(k+1|k+1)$ is the
covariance of the error $x_i(k+1) - \hat{x}_i(k+1|k+1)$, again based on $H_i$.
Also under hypothesis $H_i$, $\gamma_i(k+1)$ is zero mean with covariance
$V_i(k+1)$, and it is normally distributed (since we have assumed that all
noises are Gaussian). Furthermore, conditioned on $H_i$, $I_k$, and $u(k)$,
$y(k+1)$ is Gaussian, has mean $C_i(k)\hat{x}_i(k+1|k)$ and covariance
$V_i(k+1)$. Thus, from (2.8) we deduce that

$$p(y(k+1)\,|\,H_i,I_k,u(k)) = \frac{1}{(2\pi)^{m/2}[\det V_i(k+1)]^{1/2}} \exp\left\{-\tfrac{1}{2}\gamma_i'(k+1)V_i^{-1}(k+1)\gamma_i(k+1)\right\} \tag{2.13}$$

where $m$ is the dimension of $y$.

Equations (2.5) - (2.8) and (2.13) define the MM algorithm. The inputs
to the procedure are the $y(k)$ and $u(k)$, and the outputs are the $p_i(k)$.
The implementation of the algorithm can be viewed as consisting of a bank of
$N$ Kalman filters, one based on each of the $N$ possible models. The outputs
of these Kalman filters are the innovations sequences $\gamma_i(k+1)$, which
effectively measure how well each of the filters can track and predict the
behavior of the observed data. Specifically, if the $i$th model is correct,
then the one-step prediction error $\gamma_i(k)$ should be a white sequence,
resulting only from the intrinsic uncertainty in the $i$th model. However, if
the $i$th model is not correct, then $\gamma_i(k)$ will not be white and will
include errors due to the fact that the prediction is based on an erroneous
model. Thus the probability calculation (2.5), (2.13) basically provides a
quantitative way in which to assess which model is most likely to be correct
by comparing the performances of predictors based on these models.
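
The recursion (2.5)-(2.13) can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the cited applications: the two scalar models, the noise levels, the zero input, and the helper names `mm_step`, `scalar_model`, and `run_demo` are all assumptions introduced here, the matrices are time-invariant, and the deterministic terms $g_i$ and $b_i$ are taken to be zero.

```python
import numpy as np


def mm_step(models, estimates, probs, y, u):
    """One MM cycle: update every Kalman filter, then apply Bayes' rule (2.5)."""
    likelihoods = np.empty(len(models))
    new_estimates = []
    for i, (mdl, (x, P)) in enumerate(zip(models, estimates)):
        A, B, C, Q, R = (mdl[name] for name in ("A", "B", "C", "Q", "R"))
        x_pred = A @ x + B @ u                  # (2.6) with g_i = 0 (assumption)
        P_pred = A @ P @ A.T + Q                # (2.9)
        V = C @ P_pred @ C.T + R                # (2.10)
        K = P_pred @ C.T @ np.linalg.inv(V)     # (2.11)
        gamma = y - C @ x_pred                  # (2.8) with b_i = 0 (assumption)
        new_estimates.append((x_pred + K @ gamma,          # (2.7)
                              P_pred - K @ C @ P_pred))    # (2.12)
        quad = gamma @ np.linalg.solve(V, gamma)           # (2.14)
        norm = np.sqrt((2.0 * np.pi) ** len(y) * np.linalg.det(V))
        likelihoods[i] = np.exp(-0.5 * quad) / norm        # (2.13)
    posterior = likelihoods * probs
    return new_estimates, posterior / posterior.sum()


def scalar_model(a):
    """Hypothetical model x(k+1) = a x(k) + w(k), y(k) = x(k) + v(k)."""
    return {"A": np.array([[a]]), "B": np.zeros((1, 1)), "C": np.eye(1),
            "Q": np.eye(1), "R": 0.1 * np.eye(1)}


def run_demo(steps=200, seed=0):
    rng = np.random.default_rng(seed)
    models = [scalar_model(0.95), scalar_model(0.3)]
    estimates = [(np.zeros(1), np.eye(1)) for _ in models]
    probs = np.array([0.5, 0.5])
    x, u = np.zeros(1), np.zeros(1)
    for _ in range(steps):
        # The data are generated by model 0 (a = 0.95).
        x = 0.95 * x + rng.normal(0.0, 1.0, size=1)
        y = x + rng.normal(0.0, np.sqrt(0.1), size=1)
        estimates, probs = mm_step(models, estimates, probs, y, u)
    return probs


if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the probability of the model that generated the data approaches one, because the mismatched filter produces innovations that are large relative to its own predicted covariance.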

Let us now address several of the most important questions that arise
in understanding how the MM algorithm should be used. Clearly a very
important question concerns the use of MM in problems in which the real
system is nonlinear and/or the noises are non-Gaussian. The answer to this
problem is extremely application-dependent. The Gaussian assumption is
basically used in one place, i.e. in the evaluation of
$p(y(k+1)\,|\,H_i,I_k,u(k))$ in (2.13). It has been our experience that using
this formula, even when $\gamma_i(k+1)$ is non-Gaussian, causes essentially no
performance degradation. As we have pointed out, what MM really attempts to do
is to calculate a measure of how well each of the Kalman filters is tracking
by looking at the prediction errors $\gamma_i(k+1)$, and the $p_i(k)$ are
simply measures of how well each of the models is tracking relative to the
others and to how well we would expect them to be tracking. The critical term
in (2.13) in general is

$$\gamma_i'(k+1)V_i^{-1}(k+1)\gamma_i(k+1) \tag{2.14}$$

which is the square of the tracking error normalized by the predicted
covariance of these errors assuming $H_i$ is true. Thus if this quantity is
large, we would tend to disregard the $i$th model, while if this is small, the
$i$th filter is tracking well. The $p_i(k)$ exhibit exactly this type of
behavior, and thus we can expect MM to be reasonably robust to non-Gaussian
statistics. Of course this depends upon the application, but we have had
good success in several applications [5, 10] in which the noises were
non-Gaussian.

As far as the nonlinearity of the real system is concerned, an obvious
approach is to linearize the system about a number of operating points for
each possible model and use these linearized models to design extended Kalman
filters which would be used in place of Kalman filters in the MM algorithm.
Again the utility of this approach depends very much on the particular
application. Essentially the issue is whether the tracking error from
the extended Kalman filter corresponding to the linearized model "closest
to" the true, nonlinear system is markedly smaller than the errors from
filters based on "more distant" models. This is basically a signal-to-noise
ratio problem, similar to that seen in the idealized MM algorithm in which
everything is linear. In that case the noise is measured by the $V_i(k+1)$.
The larger these are, the harder it will be to distinguish the models (the
quantity in (2.14) becomes smaller as $V_i$ is increased, and this in turn
tends to flatten out (as a function of $i$) the probabilities in (2.13)). In
the nonlinear case, the inaccuracies of the extended Kalman filters
effectively increase the $V_i(k+1)$, thus reducing their tracking capabilities
and making it more difficult to distinguish among them. Therefore, the
performance of MM in this case will depend upon how "far apart" the different
models are, as compared to how well each of the trackers tracks. The farther
apart the models are, the more signal we have; the poorer the tracking
performance is, the more difficult it is to distinguish among the hypotheses.

Even if the true system is linear, there is clearly the question of the
utility of MM given the inevitability of discrepancies between the actual
system and any of the $N$ hypothesized models. Again this is a question of
signal-to-noise ratio, but in the linear case a number of results and
approaches have been developed for dealing with this problem. For example,
Baram [16] has developed a precise mathematical procedure for calculating
the distance between different linear models, and he has shown that the MM
procedure will converge to the model closest to the real model (i.e.
$p_i(k) \to 1$ for the model nearest the true system). This can be viewed as a
technique for testing the robustness of MM or as a tool that enables us to
decide what models to choose. That is, if the real system is in some set of
models that may be infinite or may in fact represent a continuum of models
(corresponding to the precise values of certain parameters), then Baram's
results can be used to decide upon a finite set of these models that span the
original set and that are far enough apart so that MM can distinguish among
them. For example, in adaptive flight control (reference [17]) we may be
interested in determining the flight condition (operating point) of an
aircraft, and we can think of using MM by hypothesizing a set of linearized
models that span the flight envelope.

Let us now turn explicitly to the problem of detecting abrupt changes.
In such problems one must deal with one key issue that we have not yet
discussed. Specifically, in change detection we are not simply attempting
to determine which of the models given in (2.1) - (2.4) is the correct one,
but rather we are trying to detect a shift from one model to another. That
is, in this case the actual system obeys a model of the form

$$x(k+1) = A(k)x(k) + B(k)u(k) + w(k) + g(k) \tag{2.15}$$

$$y(k) = C(k)x(k) + v(k) + b(k) \tag{2.16}$$

where for each $k$ the parameters of the model correspond to one of the
hypothesized models in (2.1) - (2.4), but the model may change with time.
While this possibility is not directly taken into account in the MM method
as described to this point, this algorithm often does work well in detecting
shifts without any major modification to take this possibility into account
(see, for example, [5, 10]). The important issue in this is the adaptability
of MM and the purposes of the particular application.

To elaborate on this, let us first note that MM will, theoretically,
eventually indicate a shift from one model to another. Two things, however,
must be taken into account. In the first place, we see from (2.5) that if
$p_i(k)$ is small, then $p_i(k+1)$ will grow only slowly at best. In fact, in
practice we have found that numerical roundoff often leads to $p_i(k)$ being
set to zero if the $i$th model is not valid up to time $k$. In this case
$p_i(j)$ will be zero for all $j > k$. In order to avoid this drastic effect
and also the extremely sluggish response of MM to a change in models, a lower
bound is usually set on the $p_i(k)$. In different applications we have found
bounds from $10^{-3}$ down to $10^{-5}$ to be satisfactory, with very little
sensitivity to the precise value of the bound. As a second point we note that
if a particular model is not correct up until time $k$, the Kalman filter
based on this model may develop large errors. If then this model becomes
correct at time $k$, it may take a long time before the prediction errors
(2.8) decrease to reflect the validity of the model. From (2.13) and (2.5) we
see that this in turn means that MM may not respond to this change for some
time. In practice we have found that this is not a particularly bad problem if
the errors in all of the Kalman filters remain bounded even when the model on
which they are based is incorrect. If a particular real system-mismatched
Kalman filter combination is unstable, then there may be problems if the
system switches to the model corresponding to this filter. What we have found
to be a workable solution to this problem is to reset the estimates of
potentially divergent Kalman filters to the estimate of the most probable
model, and this is done whenever the probability of possibly diverging filters
falls below a threshold (such as $10^{-2}$).
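
The effect of the lower bound can be seen in a small sketch. The likelihood values below are artificial (a fixed ratio of $10^{-10}$ per step, chosen only to force floating-point underflow quickly), and the names `bayes_update` and `switch_demo` are hypothetical; only the update rule itself is (2.5).

```python
import numpy as np


def bayes_update(probs, likelihoods, floor=0.0):
    """Bayes' rule (2.5), optionally followed by a lower bound on each p_i."""
    posterior = probs * likelihoods
    posterior = posterior / posterior.sum()
    if floor > 0.0:
        posterior = np.maximum(posterior, floor)
        posterior = posterior / posterior.sum()   # renormalize after bounding
    return posterior


def switch_demo(floor, steps_before=40, steps_after=5):
    """Model 0 fits the data for a while, then model 1 becomes the better fit."""
    probs = np.array([0.5, 0.5])
    for _ in range(steps_before):
        probs = bayes_update(probs, np.array([1.0, 1e-10]), floor)
    for _ in range(steps_after):
        probs = bayes_update(probs, np.array([1e-10, 1.0]), floor)
    return probs


if __name__ == "__main__":
    print("no bound:   ", switch_demo(0.0))
    print("bound 1e-3: ", switch_demo(1e-3))
```

Without the bound, the probability of model 1 underflows to exactly zero and no amount of later evidence can revive it; with the bound it recovers within a step or two of the switch.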

With these modifications MM will respond more quickly to model changes.
Whether this is adequate depends upon the application. In particular, if
fast response is needed for control purposes or because additional model
shifts are possible, then one may wish to consider a problem formulation
that explicitly includes model switches. Furthermore, in some applications
the time at which a shift occurs is exceedingly important, and in such a
case one may again prefer to use such an explicit formulation, as one must
in applications such as multi-object tracking [37] in which keeping track
of large numbers of possibilities is crucial.

In the next section we describe one such formulation, and in the
remainder of this section we indicate how the MM formulation can be modified
to incorporate model changes and what the cost is for this modification.
Specifically suppose that the real system does correspond to one of the
models (2.1) - (2.4) for each $k$ but that the model may change from time to
time. Clearly there are several different constraints that we can place on
the possible sequences of models. For example, if there are no constraints,
then there are $N^k$ possible sequences of models over the first $k$ time
steps (any of $N$ at $k=0$, any of $N$ at $k=1$, ...). Such a situation
arises, for example, if one assumes that the sequence of models is governed by
a finite-state Markov process. Such models have been considered by several
authors. See for example [40-42] in which, in addition to considering the
problem of estimation, these authors also consider the problem of identifying
the transition probability matrix for the finite-state process.

On the other hand, in many problems one is interested in detecting
individual abrupt changes which are sufficiently separated in time so that
they can be detected and accounted for separately. In such a case it is
reasonable to allow only those sequences that start with one particular
model (the "normal" model) and have a single shift to any of the other
models. In this case there are $(kN-k+1)$ possible sequences up to time $k$;
essentially we must account for all possible failure times.
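
The two counts can be checked by brute force for small cases. The sketch below assumes that model 0 plays the role of the normal model and that the system is normal before the first of the $k$ steps, so a shift may occur at any of the $k$ steps; the function names are introduced here for illustration.

```python
from itertools import product


def is_single_shift(seq, normal=0):
    """True if seq stays at `normal` and then switches at most once, for good."""
    for j, model in enumerate(seq):
        if model != normal:
            return all(later == model for later in seq[j:])
    return True


def count_sequences(N, k):
    """Brute-force counts over k time steps with N models (model 0 = normal)."""
    all_seqs = list(product(range(N), repeat=k))
    single = [s for s in all_seqs if is_single_shift(s)]
    return len(all_seqs), len(single)


if __name__ == "__main__":
    N, k = 3, 4
    total, single = count_sequences(N, k)
    print(total, N ** k)               # unconstrained count versus N^k
    print(single, k * N - k + 1)       # single-shift count versus kN - k + 1
```

The unconstrained count grows exponentially in $k$ while the single-shift count grows only linearly, which is why the single-change assumption makes the hypothesis tree manageable.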



The MM solution for any such set of possible sequences of models is
conceptually identical to that discussed previously, except here in principle
we must design a Kalman filter for each allowable sequence of models. The
residuals from these filters are then used exactly as described earlier to
compute the probabilities for all hypothesized sequences. Since the number
of possible sequences and thus filters grows in time, some method for pruning
the tree of hypotheses is needed. For example, we can think of throwing
away very unlikely models. A variety of techniques for handling such
MM trees have been considered in the literature [18, 19, 37]. While this
may at first glance appear to be a hopelessly complex solution to the change
detection problem, this approach is not without merit. Specifically, as in
[19] this approach often provides a great deal of insight. Also, the
implementation of Kalman filter trees is not only within the realm of
feasibility using high speed digital hardware, but it is also unavoidable in
problems such as multi-object tracking.

3. The Generalized Likelihood Ratio (GLR) Method

The starting point for the GLR method is a model describing normal
operation of the observed signals or of the system which generated them.
Abrupt changes are then modeled as additive disturbances to this model that
begin at unknown times. While there are strong similarities between the GLR
and MM formulations (indeed in many cases one can use either approach with
success), the structure of the GLR algorithm is significantly different
from that of the MM technique. As just discussed for MM, we will look at
the case of a single such change, the assumption being that abrupt changes
are sufficiently separated to allow for individual detection and
compensation. The solution to the problem just described and applications of
the method can be found in [1, 3, 5, 10, 20, 21, 25]. In this section we
outline the basic ideas behind the technique and discuss some of its
properties.

We assume that the system under consideration can be modeled as

$$x(k+1) = A(k)x(k) + B(k)u(k) + w(k) + f_i(k,\theta)\nu \tag{3.1}$$

$$y(k) = C(k)x(k) + v(k) + g_i(k,\theta)\nu \tag{3.2}$$

where the normal model consists of these equations without the $f_i$ and $g_i$
terms. These terms, $f_i(k,\theta)\nu$ and $g_i(k,\theta)\nu$, represent the
presence of the $i$th type of abrupt change, $i=1,\ldots,N$. Here $\theta$ is
the unknown time at which the failure occurs (so
$f_i(k,\theta) = g_i(k,\theta) = 0$ for $k < \theta$), and $f_i$ and $g_i$ are
the specified dynamic profiles of the $i$th change type. For example, if
$f_i = 0$ and $g_i$ is a vector whose components are all zero except for the
$j$th one, which equals 1 for $k \ge \theta$, then this corresponds to the
onset of a bias in the $j$th component of $y$. Finally, the scalar $\nu$
denotes the magnitude of the failure (e.g. the size of a bias), which we can
model as known (as in MM and as in what is called simplified GLR (SGLR)) or
unknown.

Assume that we design a Kalman filter based on normal operation, i.e.
by neglecting $f_i$ and $g_i$. From the previous section we have that this
filter is given by

$$\hat{x}(k+1|k) = A(k)\hat{x}(k|k) + B(k)u(k) \tag{3.3}$$

$$\hat{x}(k+1|k+1) = \hat{x}(k+1|k) + K(k+1)\gamma(k+1) \tag{3.4}$$

$$\gamma(k+1) = y(k+1) - C(k)\hat{x}(k+1|k) \tag{3.5}$$

where $K$, $P$, and $V$ are calculated as in (2.9) - (2.12). Suppose now that
a type $i$ change of size $\nu$ occurs at time $\theta$. Then, because of the
linearity of (3.1) - (3.5) we can write


$$x(k) = x_N(k) + \alpha_i(k,\theta)\nu \tag{3.6}$$

$$\hat{x}(k|k) = \hat{x}_N(k|k) + \beta_i(k,\theta)\nu \tag{3.7}$$

$$\hat{x}(k+1|k) = \hat{x}_N(k+1|k) + \mu_i(k+1,\theta)\nu \tag{3.8}$$

$$\gamma(k) = \gamma_N(k) + \rho_i(k,\theta)\nu \tag{3.9}$$

where $x_N$, $\hat{x}_N$, and $\gamma_N$ are the responses if no abrupt change
occurs, and the other terms are the responses due solely to the abrupt change.
Straightforward calculations yield recursive equations for these quantities:


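$$\alpha_i(k+1,\theta) = A(k)\alpha_i(k,\theta) + f_i(k,\theta), \qquad \alpha_i(\theta,\theta) = 0 \tag{3.10}$$

$$\beta_i(k+1,\theta) = [I - K(k+1)C(k+1)]\mu_i(k+1,\theta) + K(k+1)[C(k+1)\alpha_i(k+1,\theta) + g_i(k+1,\theta)] \tag{3.11}$$

$$\mu_i(k+1,\theta) = A(k)\beta_i(k,\theta), \qquad \beta_i(\theta-1,\theta) = 0 \tag{3.12}$$

$$\rho_i(k,\theta) = C(k)[\alpha_i(k,\theta) - \mu_i(k,\theta)] + g_i(k,\theta) \tag{3.13}$$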
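
The recursions (3.10)-(3.13) can be checked numerically: with all noises set to zero and zero initial conditions, the residual of the normal-mode filter is exactly $\rho_i(k,\theta)\nu$. The sketch below does this for a hypothetical scalar system with a sensor bias ($f = 0$, $g = 1$ for $k \ge \theta$). The system values, the time-invariant matrices, the restriction $\theta \ge 1$, and all function names are assumptions made for illustration.

```python
import numpy as np


def kalman_gains(A, C, Q, R, P0, steps):
    """Gains K(1..steps) and covariances V(1..steps), as in (2.9)-(2.12)."""
    K, V, P = [None], [None], P0
    for _ in range(steps):
        P_pred = A @ P @ A.T + Q
        V_k = C @ P_pred @ C.T + R
        K_k = P_pred @ C.T @ np.linalg.inv(V_k)
        P = P_pred - K_k @ C @ P_pred
        K.append(K_k)
        V.append(V_k)
    return K, V


def signature(A, C, K, f, g, theta, steps):
    """rho(k, theta) for k = 0..steps via (3.10)-(3.13), assuming theta >= 1."""
    n, m = A.shape[0], C.shape[0]
    # alpha and beta stay zero until f or g becomes nonzero, which matches
    # the initial conditions alpha(theta, theta) = 0, beta(theta-1, theta) = 0.
    alpha, beta = np.zeros(n), np.zeros(n)
    rho = [np.zeros(m)]
    for k in range(steps):
        alpha = A @ alpha + f(k, theta)                          # (3.10)
        mu = A @ beta                                            # (3.12)
        rho.append(C @ (alpha - mu) + g(k + 1, theta))           # (3.13)
        beta = ((np.eye(n) - K[k + 1] @ C) @ mu
                + K[k + 1] @ (C @ alpha + g(k + 1, theta)))      # (3.11)
    return np.array(rho)


def noise_free_residuals(A, C, K, f, g, theta, nu, steps):
    """Residuals of the filter (3.3)-(3.5) with w = v = 0, u = 0, x(0) = 0."""
    n = A.shape[0]
    x, x_hat = np.zeros(n), np.zeros(n)
    out = [np.zeros(C.shape[0])]
    for k in range(steps):
        x = A @ x + f(k, theta) * nu
        y = C @ x + g(k + 1, theta) * nu
        x_pred = A @ x_hat
        gamma = y - C @ x_pred
        x_hat = x_pred + K[k + 1] @ gamma
        out.append(gamma)
    return np.array(out)


# Hypothetical example: a sensor bias appearing at theta = 5.
A, C = np.array([[0.9]]), np.array([[1.0]])
Q, R, P0 = np.array([[0.1]]), np.array([[1.0]]), np.array([[1.0]])
f = lambda k, theta: np.zeros(1)
g = lambda k, theta: np.ones(1) if k >= theta else np.zeros(1)
K, V = kalman_gains(A, C, Q, R, P0, steps=20)
rho = signature(A, C, K, f, g, theta=5, steps=20)
gamma = noise_free_residuals(A, C, K, f, g, theta=5, nu=2.5, steps=20)
```

Here `gamma` agrees with `2.5 * rho` to rounding error, and `rho` starts at 1 at the onset time and then shrinks as the filter absorbs part of the bias into its state estimate.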


The important point about these quantities is that they can be
precomputed. Furthermore, by its definition, $\gamma_N(k)$ is the innovations
under normal conditions, i.e. it is zero-mean, white, Gaussian with covariance
$V(k)$. Thus we now have a standard detection problem in white noise: we
observe the filter residuals $\gamma(k)$, which can be modeled as in (3.9),
and we want to detect the presence of a change (i.e. that $k \ge \theta$) and
perhaps determine its identity $i$ and estimate its time of occurrence
$\theta$ and size $\nu$, if the latter is modeled as being unknown. The
solution to this problem involves matched filtering operations. First, define
the precomputable quantities

$$a(k,\theta,i) = \sum_{j=\theta}^{k} \rho_i'(j,\theta)V^{-1}(j)\rho_i(j,\theta) \tag{3.14}$$

This has the interpretation as the amount of information present in
$y(\theta),\ldots,y(k)$ about a type $i$ change occurring at time $\theta$.

The on-line GLR calculations consist of the calculation of

$$d(k,\theta,i) = \sum_{j=\theta}^{k} \rho_i'(j,\theta)V^{-1}(j)\gamma(j) \tag{3.15}$$

which are essentially correlations of the observed residuals with the
abrupt change signatures $\rho_i(j,\theta)$ for different hypothesized types,
$i$, and times, $\theta$. If $\nu$ is known (the SGLR case), then the
likelihood of a type $i$ change having occurred at time $\theta$ given data
$y(1),\ldots,y(k)$ is

$$\ell_s(k,\theta,i) = 2\nu\,d(k,\theta,i) - \nu^2 a(k,\theta,i) \tag{3.16}$$

If $\nu$ is unknown, then the generalized likelihood for this change is

$$\ell(k,\theta,i) = \frac{d^2(k,\theta,i)}{a(k,\theta,i)} \tag{3.17}$$

and the maximum likelihood estimate of $\nu$ assuming a change of type $i$ at
time $\theta$ is

$$\hat{\nu}(k,\theta,i) = \frac{d(k,\theta,i)}{a(k,\theta,i)} \tag{3.18}$$


Thus the GLR algorithm consists of the single Kalman filter (3.3) -
(3.5), the matched filter operations of (3.15), and the likelihood
calculation of (3.16) or (3.17). The outputs of the method are these
likelihoods and the estimates of eq. (3.18) if $\nu$ is modeled as unknown.
The basic idea behind GLR is that different types of abrupt changes produce
different kinds of effects on the filter innovations, i.e. different
signatures, and GLR calculates the likelihood of each possible event by
correlating the innovations with the corresponding signature.
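
The matched-filter quantities of (3.14)-(3.18) reduce to a pair of running sums. The following sketch evaluates them for a single hypothesized change type on hypothetical, noise-free scalar residuals with a step signature. The data, the constant $V = 2$, and the names `glr_statistics` and `sglr_likelihood` are assumptions for illustration only.

```python
import numpy as np


def glr_statistics(gamma, V, signature, theta, k):
    """Return d (3.15), a (3.14), ell (3.17), nu_hat (3.18) for one (theta, i)."""
    d = a = 0.0
    for j in range(theta, k + 1):
        rho = signature(j, theta)
        w = np.linalg.solve(V[j], rho)   # V^{-1}(j) rho(j, theta); V symmetric
        d += w @ gamma[j]
        a += w @ rho
    return d, a, d * d / a, d / a


def sglr_likelihood(d, a, nu):
    """Likelihood (3.16) when the change size nu is modeled as known."""
    return 2.0 * nu * d - nu ** 2 * a


# Hypothetical data: scalar residuals, V = 2, step signature, no noise.
step = lambda j, theta: np.ones(1) if j >= theta else np.zeros(1)
T, true_theta, true_nu = 10, 4, 3.0
V = [2.0 * np.eye(1)] * T
gamma = np.array([true_nu * step(j, true_theta) for j in range(T)])

# Scan all hypothesized onset times and keep the most likely one.
scores = {th: glr_statistics(gamma, V, step, th, T - 1)[2] for th in range(1, T)}
best_theta = max(scores, key=scores.get)
```

With these numbers the scan peaks at the true onset time, and at that hypothesis the estimate `nu_hat` equals the true size, since the residuals contain no noise.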

As with the MM method a number of issues can be raised about GLR.
Some of these, such as the effect of nonlinearities and robustness to model
errors, are very similar to the MM case. Essentially it still can be viewed
as a signal-to-noise ratio problem; in the nonlinear case the additive
decomposition of (3.9) is not precisely valid, but it may be approximately
correct. Also, different failure modes can be distinguished even in the
presence of modelling errors if their signatures are different enough. Again
these issues depend very much on the particular application. We refer the
reader to [4, 6, 10, 11, 21, 25] for discussions of several applications of
GLR to problems in which these issues had to be addressed.

GLR has been successfully applied to a wide variety of applications,
such as failure detection [1, 4], geophysical signal analysis [7], detecting
arrhythmias in electrocardiograms [6], freeway incident detection [10, 11],
and maneuver detection [20, 21, 25]. Note that the model used in (3.1),
(3.2) for such changes is an additive model. Thus it appears on the surface
that the types of abrupt changes that can be detected by GLR are a special
subset of those that can be detected by MM, since (2.1), (2.2) allow
parametric changes (in A, B, C, Q, R) as well as additive ones. There are
several points, however, that must be taken into account in assessing and
comparing MM and GLR:

(1) The price one pays for allowing parametric changes in MM is the
necessity of implementing banks of Kalman filters, and actually
trees of such filters to account for switches between models. GLR,
on the other hand, requires a single Kalman filter and a growing
number of correlation calculations as in (3.15), which in principle
must be calculated for $i=1,\ldots,N$ and $\theta=1,\ldots,k$. We will comment
shortly on the computational issues concerned with these correlations,
but for now we simply point out that they are typically far
less involved than the calculations inherent in Kalman filters
(see [4, 6, 7] for examples of how simple these calculations can
be). Also, because it operates on the outputs of a normal mode
filter, GLR can be easily implemented and attached as a monitor to
an already existing system.

(2) Extensions to the GLR method can be developed for the detection of
parametric changes [38]. This extended GLR bears some similarity
to extended Kalman filtering and iterated extended Kalman filtering.

(3) It has been our experience that a GLR system based on the detection
of additive effects can often also detect parameter failures. For
example, a gain change in a sensor does look like a sensor bias,
albeit one that is modulated by the value of the variable being
sensed. That is, any detectable change will exhibit a systematic
deviation between what is observed and what is predicted to be
observed. Obviously, the ability of GLR to detect a parametric
change when it is looking for additive ones is again a question
of robustness. If the effect of the parametric change is "close
enough" to that of the additive one, the system will work. This
has been the case in all of our experience. In particular we
refer the reader to [4] for an additive-failure-based design that
has done extremely well in detecting gain changes in sensors. Note
of course that in this mode GLR is essentially only indicating
an alarm, i.e. the estimate $\hat{\nu}$ of the "bias" is meaningless, but
in many detection problems our primary interest is in simply
identifying which of several types of changes has occurred.

There are several final issues that should be mentioned in discussing
GLR. The first concerns the calculation of statistical measures of
performance of GLR. As mentioned in the preceding section, Baram [16] has
developed a method for measuring the distance between models and hence a
measure of the detectability and distinguishability of different failure
modes. Similar calculations can be performed for GLR, but in this case they
are actually simpler to do and interpret, as we can use standard
detection-theoretic ideas. Specifically, a direct measure of the detectability
of a particular type of change is the information $a(k,\theta,i)$ defined in
(3.14). This quantity can be viewed as the correlation of $\rho_i(j,\theta)$
with itself at zero lag. Similarly, we can determine the relative
distinguishability of a type $i$ change at two times $\theta_1$ and $\theta_2$
as the correlation of the corresponding signatures

$$a(k,\theta_1,\theta_2,i) = \sum_{j=\max(\theta_1,\theta_2)}^{k} \rho_i'(j,\theta_1)V^{-1}(j)\rho_i(j,\theta_2) \tag{3.19}$$

and the relative distinguishability of type $i$ and $m$ changes at times
$\theta_1$ and $\theta_2$ similarly:

$$a(k,\theta_1,\theta_2,i,m) = \sum_{j=\max(\theta_1,\theta_2)}^{k} \rho_i'(j,\theta_1)V^{-1}(j)\rho_m(j,\theta_2) \tag{3.20}$$

These quantities provide us with extremely useful information. For example,
in some applications [6-9] the estimation of the time $\theta$ at which the
change occurs is critical, and (3.19) provides information about how well one
can resolve the onset time. In failure detection applications these quantities
directly provide us with information about how system redundancy is used to
detect and distinguish failures and can be used in deciding whether
additional redundancy (e.g. more sensors) is needed. Also, the quantities in
(3.14), (3.19), and (3.20) directly give the statistics of the likelihood
measures (3.16), (3.17). For the SGLR case of (3.16), $\ell_s$ is Gaussian,
and its mean under no failure is $-\nu^2 a(k,\theta,i)$, while if a type $m$
failure occurs at time $\phi$, its mean is

$$E[\ell_s(k,\theta,i)\,|\,(m,\phi)] = \nu^2[2a(k,\theta,\phi,i,m) - a(k,\theta,i)] \tag{3.21}$$

For example if $(m,\phi) = (i,\theta)$, i.e. if the precise failure and time
assumed in the calculation of $\ell_s(k,\theta,i)$ are true, then its mean is
$+\nu^2 a(k,\theta,i)$. In the case of (3.17), under no failure
$\ell(k,\theta,i)$ is a chi-squared random variable with 1 degree of freedom,
while if a failure $(m,\phi)$ of size $\nu$ occurs, $\ell(k,\theta,i)$ is
non-central chi-squared with mean

$$E[\ell(k,\theta,i)\,|\,(m,\phi)] = 1 + \frac{\nu^2 a^2(k,\theta,\phi,i,m)}{a(k,\theta,i)} \tag{3.22}$$
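
The means quoted above follow from a short calculation, sketched here under the assumption stated after (3.9) that the normal-mode innovations $\gamma_N$ are zero-mean, white, and Gaussian with covariance $V(j)$. The shorthand $a_\times = a(k,\theta,\phi,i,m)$ is a label introduced only for this sketch, and the size $\nu$ used in the SGLR test is taken to equal the true size of the change.

```latex
\begin{aligned}
\gamma(j) &= \gamma_N(j) + \rho_m(j,\phi)\,\nu \qquad (j \ge \phi), \\
E[d(k,\theta,i)] &= \nu \sum_{j=\max(\theta,\phi)}^{k}
      \rho_i'(j,\theta)\,V^{-1}(j)\,\rho_m(j,\phi) = \nu\,a_\times, \\
\operatorname{Var}[d(k,\theta,i)] &= \sum_{j=\theta}^{k}
      \rho_i'(j,\theta)\,V^{-1}(j)\,V(j)\,V^{-1}(j)\,\rho_i(j,\theta)
      = a(k,\theta,i), \\
E[\ell_s(k,\theta,i)] &= 2\nu\,E[d] - \nu^2 a(k,\theta,i)
      = \nu^2\,[\,2a_\times - a(k,\theta,i)\,], \\
E[\ell(k,\theta,i)] &= \frac{\operatorname{Var}[d] + (E[d])^2}{a(k,\theta,i)}
      = 1 + \frac{\nu^2 a_\times^2}{a(k,\theta,i)}.
\end{aligned}
```

Setting $\nu = 0$ recovers the no-failure means, $-\nu^2 a(k,\theta,i)$ for the SGLR statistic (where the $\nu$ in that expression is the hypothesized size) and 1 for the chi-squared statistic.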

Clearly these quantities can be very useful in evaluating the performance of
GLR detection algorithms and for determining decision rules based on the
GLR outputs. If one were to follow the precise GLR philosophy [39], the
decision rule one would use is to choose at each time $k$ the largest of the
$\ell_s(k,\theta,i)$ or $\ell(k,\theta,i)$ over all possible change types $i$
and onset times $\theta$. This largest value would then be compared to a
threshold for change detection, and if the threshold is exceeded the
corresponding maximizing values of $\theta$ and $i$ are taken as the estimates
of change type and time. While such a simple rule works in some cases
[6, 21], it is often worthwhile to consider more complex rules based on the
$\ell$'s. For example, persistence tests (i.e. $\ell$ must exceed the
threshold over some time period) are often used to cut down on false alarms
due to spurious and unmodeled events. See [4, 7, 9, 26] for more discussion of
decision rules.
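
A persistence test of the kind just mentioned can be sketched as follows. This is only one plausible form of such a rule and not necessarily the one used in the cited references; the function name, the threshold, and the hold length are hypothetical.

```python
def persistence_alarm(stats, threshold, hold):
    """Return the first index at which `stats` has exceeded `threshold`
    for `hold` consecutive samples, or None if that never happens."""
    run = 0
    for k, value in enumerate(stats):
        run = run + 1 if value > threshold else 0
        if run >= hold:
            return k
    return None


if __name__ == "__main__":
    # Hypothetical sequence of maximized likelihoods with one spurious spike.
    ell_max = [0.5, 9.0, 0.3, 8.0, 9.0, 10.0, 11.0]
    print(persistence_alarm(ell_max, threshold=5.0, hold=1))  # plain threshold
    print(persistence_alarm(ell_max, threshold=5.0, hold=3))  # persistence test
```

With `hold=1` the rule reduces to the simple threshold test and fires on the isolated spike; requiring three consecutive exceedances delays the alarm until the sustained rise.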

A final issue to be mentioned is the pruning of the tree of
possibilities. As in the MM case, in principle we have a growing number of
calculations to perform, as $d(k,\theta,i)$ must be calculated for
$i=1,\ldots,N$ and all possible change times up to the present, i.e.
$\theta=1,\ldots,k$. What is usually done is to look only over a sliding
window of possible times:

$$k - M_1 \le \theta \le k - M_2 \tag{3.23}$$

where $M_1$ and $M_2$ are chosen based on the $a$'s, i.e. on detectability and
distinguishability considerations. Basically after $M_2$ time steps from the
onset of change we have collected enough information so that we may make
a detection with a reasonable amount of accuracy. Further, after $M_1$ time
steps we will have collected a sufficient amount of information so that
detection performance is as good as it can be (i.e. there is no point in
waiting any longer). Clearly we want $M_1$ and $M_2$ large to allow for
maximum information collection, but we want them small for fast response and
for computational simplicity. This is a typical tradeoff that arises in all
change detection problems.
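
The window (3.23) fixes the number of hypothesized onset times examined at each step. A minimal sketch, in which the function name is hypothetical and onset times are clipped to start at 1 as in the text:

```python
def window_onset_times(k, M1, M2):
    """Onset times theta with k - M1 <= theta <= k - M2, clipped to theta >= 1."""
    return list(range(max(1, k - M1), k - M2 + 1))


if __name__ == "__main__":
    thetas = window_onset_times(k=100, M1=30, M2=5)
    print(thetas[0], thetas[-1], len(thetas))
```

Once $k > M_1$, each time step requires at most $N(M_1 - M_2 + 1)$ correlations of the form (3.15), independent of $k$.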